Sequence Labeling
Core Concept
Sequence labeling is the task of assigning a label to each element of an input sequence, so that the output is a label sequence of the same length as the input. The input is typically a sequence of tokens (e.g. words, characters, or time steps) and the output is the corresponding sequence of tags (e.g. part-of-speech, named-entity type, chunk type). Dependencies between labels are often important: adjacent or nearby tags tend to follow valid patterns (e.g. IOB constraints in NER, where an inside tag must continue a segment that has already begun), so models usually capture local or long-range structure rather than predicting each position independently. This makes sequence labeling a core structured-prediction problem with applications in NLP, speech, and time-series analysis.
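As a minimal sketch of the two properties described above (one tag per token, and constraints between neighboring tags), the following checks a BIO tag sequence for validity. The function name `is_valid_bio` and the example sentence are illustrative assumptions, not part of any particular library.

```python
from typing import List


def is_valid_bio(tags: List[str]) -> bool:
    """Return True if every I-X tag directly follows B-X or I-X of the same type."""
    prev = "O"  # treat the position before the sequence as outside any segment
    for tag in tags:
        if tag.startswith("I-"):
            seg_type = tag[2:]
            # I-X may only continue a segment of type X that is already open.
            if prev not in ("B-" + seg_type, "I-" + seg_type):
                return False
        prev = tag
    return True


# Hypothetical example: one tag per token, so the lengths must match.
tokens = ["Ada", "Lovelace", "visited", "London"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
assert len(tokens) == len(tags)

print(is_valid_bio(tags))             # True
print(is_valid_bio(["O", "I-PER"]))   # False: I-PER has no preceding B-PER
```

A model that predicts each position independently can emit sequences like the second one, which is one reason structured models or constrained decoding are used.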
Key Characteristics
- One label per position – Output length equals input length; each token (or frame) gets exactly one label. This distinguishes sequence labeling from segmentation, where boundaries are predicted directly, though encoding schemes like BIO merge the two (each label encodes both the segment type and the position within the segment).
- Label dependencies – Labels are not independent: valid tag sequences obey constraints (e.g. in BIO, I-X may only follow B-X or I-X of the same type; POS tags follow grammatical patterns). Models capture these dependencies with Markov assumptions (e.g. bigram transitions between adjacent tags), linear-chain CRFs, or recurrent/attention networks.
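- Decoding – Finding the best label sequence under the model is done with the Viterbi algorithm when scores factorize over a linear chain (HMMs, linear-chain CRFs, neural encoders with a CRF output layer); exact decoding is then tractable, costing O(nK²) time for n positions and K labels with first-order transitions. Neural models that score each position independently decode with a per-position argmax, and beam search is used as an approximation when the model conditions on its own earlier predictions.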
- Training signal – Supervision is usually a full label sequence per input; training maximizes the joint likelihood of inputs and labels (generative HMM) or the conditional likelihood of labels given inputs (discriminative CRF, neural), or minimizes a margin-based (structured hinge) loss.
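The decoding step described above can be sketched as a small Viterbi implementation for first-order (linear-chain) scores. This is an illustrative sketch in plain Python, not the API of any specific library: the function name `viterbi`, the score layout, and the toy numbers are all assumptions. Scores are assumed to be additive (log-probabilities or unnormalized potentials), and start/end transition scores are omitted for brevity.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][y]   : score of label y at position t
    transitions[p][y] : score of moving from label p to label y
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    # best[y] = score of the best path that ends in label y at the current position
    best = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for y in range(n_labels):
            # Best previous label for arriving at label y.
            p = max(range(n_labels), key=lambda q: best[q] + transitions[q][y])
            new_best.append(best[p] + transitions[p][y] + emissions[t][y])
            ptrs.append(p)
        best = new_best
        backptrs.append(ptrs)
    # Trace back from the best final label.
    y = max(range(n_labels), key=lambda q: best[q])
    path = [y]
    for ptrs in reversed(backptrs):
        y = ptrs[y]
        path.append(y)
    path.reverse()
    return path


# Toy example with labels 0=O, 1=B, 2=I; the O -> I transition is forbidden.
NEG = -1e9
transitions = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B
    [0.0, 0.0, 0.0],  # from I
]
emissions = [
    [2.0, 1.0, 0.0],
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
]
print(viterbi(emissions, transitions))  # [0, 1, 0], i.e. O B O
```

In this toy example a per-position argmax would pick O, I, O, which violates the BIO constraint; the transition scores steer Viterbi to the best valid sequence instead. The two nested loops over labels at each position give the O(nK²) cost typical of first-order chains.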
Common Applications
- Part-of-speech (POS) tagging – Assigning grammatical category (noun, verb, etc.) to each word in a sentence
- Named entity recognition (NER) – Identifying and typing spans such as person, organization, location (often with BIO/BIOES encoding)
- Chunking (shallow parsing) – Labeling segments such as noun phrase, verb phrase
- Speech recognition (frame-level) – Labeling each frame or segment with a phone or state
- Biological sequence annotation – Labeling residues or bases with function or structure
- Gesture and activity recognition – Labeling each time step with gesture or activity class
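For span-oriented applications such as NER and chunking, per-token BIO tags are usually converted back into typed spans after decoding. The following is a minimal sketch of that conversion; the function name `bio_to_spans` and the lenient handling of a stray I-X tag (treated as opening a new span) are assumptions chosen for illustration, and stricter conventions exist.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (type, start, end) spans, with end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # Assumption: a stray I-X (no matching open span) starts a new span.
            start, label = i, tag[2:]
    return spans


tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 4)]
```

Span-level evaluation of NER and chunking is typically computed over spans like these rather than over individual tags.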